Enhance experiment runner with deterministic controls#193

Open
buzypi wants to merge 1 commit into karpathy:master from buzypi:pr/workflow-runner-only
buzypi commented Mar 11, 2026

Summary

This PR adds an execution workflow for autonomous experiments, replacing session-by-session interpretation of program.md with a deterministic runner plus an agent runbook.

What’s included

  • Add workflows/run_experiment.py as the single experiment orchestrator:
    • start, resume, status commands
    • top-level stage controls: setup, baseline, loop
    • loop sub-stage controls: propose, apply, commit, train, triage, record, decide
    • resumable checkpointing under workflows/runs/<run_id>/
    • run-id policy: <branch-slug>-rNNN
  • Add AGENTS.md runbook with explicit natural-language to command mapping for agent sessions.
  • Setup robustness:
    • auto-run uv run prepare.py when cache/tokenizer is missing (default on, opt-out via --no-auto-prepare)
    • explicit setup precondition checks before baseline/loop
  • Background training support:
    • training stages start in background by default (--background-train)
    • resume polls/continues in-flight baseline/train jobs
  • Human intervention support:
    • proposal override via run-scoped workflows/runs/<run_id>/next_proposal.json
    • proposal override via explicit --proposal-file <path> on start/resume
    • canonical override schema at workflows/schemas/proposal.schema.json
    • deterministic precedence in propose stage: --proposal-file -> next_proposal.json -> stochastic proposal -> deterministic fallback
    • consumed run-scoped proposals are archived to workflows/runs/<run_id>/consumed_proposals/iter_<NNNN>.json
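
The propose-stage precedence listed above can be sketched roughly as follows. This is an illustrative sketch, not the code in workflows/run_experiment.py: the function name `resolve_proposal`, the two callables, and the returned source labels are hypothetical, and it assumes a stochastic proposal that yields nothing is signalled by `None`. Archiving consumed proposals is not covered.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Tuple


def resolve_proposal(
    run_dir: Path,
    proposal_file: Optional[Path],
    stochastic: Callable[[], Optional[dict]],
    fallback: Callable[[], dict],
) -> Tuple[str, dict]:
    """Return (source_label, proposal) following the documented precedence.

    All names here are hypothetical; only the ordering comes from the PR text.
    """
    # 1. An explicit --proposal-file always wins (raises if the file is missing).
    if proposal_file is not None:
        return "proposal-file", json.loads(proposal_file.read_text(encoding="utf-8"))

    # 2. Otherwise use the run-scoped override, if one has been dropped in.
    run_scoped = run_dir / "next_proposal.json"
    if run_scoped.is_file():
        return "next_proposal.json", json.loads(run_scoped.read_text(encoding="utf-8"))

    # 3. Otherwise try stochastic generation (assumed to return None on failure).
    proposal = stochastic()
    if proposal is not None:
        return "stochastic", proposal

    # 4. Last resort: the deterministic fallback, which must always produce something.
    return "fallback", fallback()
```

Returning the source label alongside the proposal makes it easy to record which branch was taken for a given iteration.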

Why

In long-running autonomous sessions, prose-only execution is fragile and inconsistent; there have been multiple complaints about this on X.
This PR makes runs repeatable, resumable, inspectable, and human-steerable while preserving program.md as the policy/objective layer.

buzypi commented Mar 11, 2026

Basic End-to-End Flow (via OpenCode)

  1. Start OpenCode in the repo root:
opencode
  2. Ask it to start a run:

Type this in OpenCode: "Start running the experiment, run 5 loops"

It maps to:

python workflows/run_experiment.py start --loops 5
  3. Continue later:

Type this in OpenCode: "Run another 5 iterations"

It maps to:

python workflows/run_experiment.py resume --loops 5
  4. Check status:

Training runs happen in the background, and the agent sleeps for an interval it determines itself. You can interrupt it and ask questions such as: "Show the run status".

It maps to:

python workflows/run_experiment.py status

You can also ask other questions, such as "Tell me about the results so far" or "What is the GPU usage", and then ask the agent to resume its work.

Human-in-the-Loop Override

If you want to inject your own experiment idea instead of stochastic proposal generation:

Type this in OpenCode: "In the next iteration, increase the LR by 10% and keep warmup unchanged. Use this as a human proposal and run 1 loop."
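
For a headless equivalent, a proposal can be placed at the run-scoped path workflows/runs/<run_id>/next_proposal.json by hand or from a script. The sketch below shows one way to do that. The helper name `write_next_proposal` is hypothetical, and the proposal keys in the usage note are placeholders only: the real keys are defined by workflows/schemas/proposal.schema.json, which is not reproduced here.

```python
import json
from pathlib import Path


def write_next_proposal(
    run_id: str,
    proposal: dict,
    root: Path = Path("workflows/runs"),
) -> Path:
    """Write a human proposal to <root>/<run_id>/next_proposal.json.

    Hypothetical helper; the proposal dict is passed through unchanged and is
    NOT validated against the schema here.
    """
    run_dir = root / run_id
    if not run_dir.is_dir():
        raise FileNotFoundError(f"run directory not found: {run_dir}")

    target = run_dir / "next_proposal.json"
    # Write to a temporary sibling first, then rename, so a runner polling the
    # directory never sees a half-written file.
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(json.dumps(proposal, indent=2) + "\n", encoding="utf-8")
    tmp.replace(target)
    return target
```

A call might look like `write_next_proposal(run_id, {"description": "increase LR by 10%, keep warmup unchanged"})`, where `description` is a placeholder key rather than a field confirmed by the schema.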

Useful Stage-Control Examples (Direct)

To run headless, invoke the runner directly:

Setup + baseline only:

python workflows/run_experiment.py start --only setup,baseline

Run only selected loop internals for 3 iterations:

python workflows/run_experiment.py resume --loops 3 --only loop --loop-only train,record,decide

Foreground mode (disable background training):

python workflows/run_experiment.py resume --loops 1 --no-background-train
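
If these invocations are driven from another script, the argument list can be assembled programmatically. The sketch below uses only the subcommands, flags, and stage names shown in this PR; the helper `build_command` and its parameter names are hypothetical and not part of the runner.

```python
import sys
from typing import List, Optional, Sequence

RUNNER = "workflows/run_experiment.py"
COMMANDS = ("start", "resume", "status")
STAGES = ("setup", "baseline", "loop")
LOOP_STAGES = ("propose", "apply", "commit", "train", "triage", "record", "decide")


def build_command(
    command: str,
    loops: Optional[int] = None,
    only: Optional[Sequence[str]] = None,
    loop_only: Optional[Sequence[str]] = None,
    background_train: bool = True,
) -> List[str]:
    """Build an argv list for the runner (hypothetical helper)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    # Reject stage names that are not listed in the PR description.
    for name in only or ():
        if name not in STAGES:
            raise ValueError(f"unknown stage: {name}")
    for name in loop_only or ():
        if name not in LOOP_STAGES:
            raise ValueError(f"unknown loop sub-stage: {name}")

    argv = [sys.executable, RUNNER, command]
    if loops is not None:
        argv += ["--loops", str(loops)]
    if only:
        argv += ["--only", ",".join(only)]
    if loop_only:
        argv += ["--loop-only", ",".join(loop_only)]
    if not background_train:
        argv.append("--no-background-train")
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the repo root. Passing a list rather than a shell string avoids quoting problems.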

Logs and Artifacts

Run outputs are written to:

  • workflows/runs/<run_id>/runner.log (human-readable timeline)
  • workflows/runs/<run_id>/history.jsonl (structured events)
  • workflows/runs/<run_id>/state.json (checkpoint state)
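
Since history.jsonl holds structured events and state.json holds the checkpoint, both can be inspected with the standard library. This is a minimal sketch that assumes history.jsonl contains one JSON object per line and that state.json is a single JSON document. It makes no assumption about which keys the events or the state contain, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_events(run_dir: Path) -> Iterator[dict]:
    """Yield one parsed event per non-empty line of history.jsonl."""
    with (run_dir / "history.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def load_state(run_dir: Path) -> dict:
    """Load the checkpoint state for a run."""
    return json.loads((run_dir / "state.json").read_text(encoding="utf-8"))
```

For example, `sum(1 for _ in iter_events(run_dir))` counts the events recorded so far without loading the whole file into memory.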
